[Build] fix file_data_loader.cpp build issues for windows#4899
[Build] fix file_data_loader.cpp build issues for windows#4899python3kgae wants to merge 4 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4899
Note: Links to docs will display an error until the docs builds have been completed.
|
|
Thank you @python3kgae ! I would prefer that we use a single file.h, and check for cc @dbort |
Two build issues have been addressed in this pull request: 1. Avoid closing when fd_ is -1, as this would cause a crash on Windows. 2. Introduce separate headers for Windows and Unix, enabling Windows builds to utilize a distinct header and implement pread. For pytorch#4661
54a7016 to
be77d20
Compare
Changed into single file.h. |
|
Thank you! One more request: in extension/data_loader/targets.bzl line 43, can you add a line |
Like this? |
Yes |
Done. |
dbort
left a comment
There was a problem hiding this comment.
Normally we try to avoid using platform-based #ifdefs for implementations, and will create separate .cpp files that the build system brings in as necessary. But this is pretty light-weight and self-contained, so it seems fine.
| @@ -0,0 +1,57 @@ | |||
| /* | |||
There was a problem hiding this comment.
Please call this file pread.h to make its purpose more clear.
| * This source code is licensed under the BSD-style license found in the | ||
| * LICENSE file in the root directory of this source tree. | ||
| */ | ||
|
|
There was a problem hiding this comment.
Please add a comment describing the purpose of this header: i.e., it ensures that a pread-compatible function is defined in the global namespace for windows and posix environments.
extension/data_loader/file.h
Outdated
|
|
||
| #include <windows.h> | ||
|
|
||
| inline ssize_t pread(int __fd, void* __buf, size_t __nbytes, size_t __offset) { |
There was a problem hiding this comment.
Please remove the leading underscores from the param names. Since this is our code and not part of the standard library, we shouldn't define any names beginning with underscore.
extension/data_loader/file.h
Outdated
| overlapped.Offset = __offset; | ||
| overlapped.OffsetHigh = __offset >> 32; |
There was a problem hiding this comment.
Will this do the right thing on both 32-bit and 64-bit architectures? Seems like it should, but I don't know enough about the definition of DWORD on different architectures
There was a problem hiding this comment.
DWORD will always be 32 bits on both 32-bit and 64-bit Windows systems.
It's important to note that the nNumberOfBytesToRead parameter for ReadFile is a DWORD. Therefore, if we need to read more than 4GB of data at once, the current implementation needs to be updated to handle larger file sizes.
extension/data_loader/file.h
Outdated
| // To avoid conflicts with std::numeric_limits<int32_t>::max() in | ||
| // file_data_loader.cpp. | ||
| #undef max |
There was a problem hiding this comment.
Please move this to the top of the file, closer to the header that defined it. If you know, please mention which header defined it, since on its own it's not clear why this file should bother undefining a macro that it didn't define.
|
@tarun292, @kirklandsign can you take a look please? |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
2 similar comments
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
There was a problem hiding this comment.
Pull request overview
This PR addresses native Windows build failures in the extension/data_loader file-backed loader by making fd cleanup safe on Windows and adding a Windows-compatible pread implementation.
Changes:
- Add
extension/data_loader/pread.hto provide apread-compatible function on Windows (using Win32 APIs) while using the POSIXpreadon non-Windows platforms. - Update
FileDataLoaderto includepread.hand remove the direct<unistd.h>dependency. - Avoid calling
close()whenfd_ == -1inFileDataLoader’s destructor (moved-from safety on Windows).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| extension/data_loader/targets.bzl | Adds pread.h as a private header for the file_data_loader target. |
| extension/data_loader/pread.h | Introduces a cross-platform pread shim with a Windows implementation. |
| extension/data_loader/file_data_loader.cpp | Switches to the new pread shim and avoids closing -1 fds. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #include <executorch/runtime/platform/compiler.h> // For ssize_t. | ||
| #include <io.h> | ||
|
|
||
| #include <windows.h> | ||
| // To avoid conflicts with std::numeric_limits<int32_t>::max() in | ||
| // file_data_loader.cpp. | ||
| #undef max | ||
|
|
||
| inline ssize_t pread(int fd, void* buf, size_t nbytes, size_t offset) { | ||
| OVERLAPPED overlapped; /* The offset for ReadFile. */ | ||
| memset(&overlapped, 0, sizeof(overlapped)); |
There was a problem hiding this comment.
pread.h isn't self-contained on Windows: it uses memset, errno, and EIO but doesn't include headers that declare/define them. Since file_data_loader.cpp includes this header before <cstring>/<cerrno>, this can cause Windows compile errors. Add the appropriate standard headers (e.g., <cstring> and <cerrno>/<errno.h>).
| overlapped.Offset = offset; | ||
| overlapped.OffsetHigh = offset >> 32; |
There was a problem hiding this comment.
overlapped.OffsetHigh = offset >> 32; can be undefined behavior on 32-bit builds because offset is a size_t (likely 32 bits) and shifting by 32 is UB. Consider first widening to uint64_t and then assigning the low/high 32-bit parts with explicit casts to DWORD to avoid UB and narrowing warnings.
| overlapped.Offset = offset; | |
| overlapped.OffsetHigh = offset >> 32; | |
| ULONGLONG offset64 = (ULONGLONG)offset; | |
| overlapped.Offset = (DWORD)offset64; | |
| overlapped.OffsetHigh = (DWORD)(offset64 >> 32); |
| BOOL result; /* The result of ReadFile. */ | ||
| DWORD bytes_read; /* The number of bytes read. */ | ||
| HANDLE file = (HANDLE)_get_osfhandle(fd); | ||
|
|
||
| result = ReadFile(file, buf, nbytes, &bytes_read, &overlapped); | ||
| DWORD error = GetLastError(); |
There was a problem hiding this comment.
ReadFile takes a DWORD byte count, but this wrapper passes size_t nbytes directly. On 64-bit this can silently truncate (and may warn/error under -Wconversion/-Werror). Add an explicit bounds check (e.g., nbytes <= MAXDWORD) and cast to DWORD before calling ReadFile; return -1 with an appropriate errno if the request is too large.
| break; | ||
| default: | ||
| errno = EIO; | ||
| break; | ||
| } | ||
| return -1; |
There was a problem hiding this comment.
The ERROR_HANDLE_EOF case currently sets errno = 0 and returns -1. POSIX pread signals EOF by returning 0 (not an error). Returning -1 here can lead to misleading logs like strerror(0) ("Success") and incorrect error handling. Return 0 for this case to match pread semantics.
| break; | |
| default: | |
| errno = EIO; | |
| break; | |
| } | |
| return -1; | |
| return 0; | |
| default: | |
| errno = EIO; | |
| return -1; | |
| } |
| #include <windows.h> | ||
| // To avoid conflicts with std::numeric_limits<int32_t>::max() in | ||
| // file_data_loader.cpp. |
There was a problem hiding this comment.
On Windows, including <windows.h> without NOMINMAX typically defines both min and max macros. This header only #undef max, but file_data_loader.cpp uses std::min, which can be broken by the min macro. Consider defining NOMINMAX before including <windows.h> and/or #undef min alongside #undef max here.
| #include <windows.h> | |
| // To avoid conflicts with std::numeric_limits<int32_t>::max() in | |
| // file_data_loader.cpp. | |
| #ifndef NOMINMAX | |
| #define NOMINMAX | |
| #endif | |
| #include <windows.h> | |
| // To avoid conflicts with std::min/std::max and std::numeric_limits<int32_t>::max() | |
| // in file_data_loader.cpp. | |
| #undef min |
Two Windows build issues have been addressed in this pull request:
For #4661